The Invariance Principle for
Dependent Random Variables



By 

Patrick Paul Billingsley 
(Abstract) 

In this paper the Erdos-Kac invariance principle, as
generalized by Donsker (Mem. Am. Math. Soc., no. 6 (1951)),
is extended to the dependent case. Let $C$ be the space of
functions continuous on the closed unit interval, with the
uniform topology. If $\{X_n\}$ is a sequence of random variables
on a probability space $(\Omega, \mathcal{B}, P)$, let $p_n$ be that element
of $C$ which is linear on each of the intervals $((j-1)n^{-1}, jn^{-1})$
and satisfies $p_n(0) = 0$ and $p_n(jn^{-1}) = X_1 + \cdots + X_j$ for
$j = 1, \ldots, n$. Thus $p_n$ is a (measurable) mapping of $\Omega$ into $C$.
Suppose there exists a sequence $\{a_n\}$ of positive constants
such that if the measure $P_n$ is defined by $P_n(A) = P\{a_n^{-1} p_n \in A\}$
for measurable subsets $A$ of $C$, then $\{P_n\}$ converges weakly
to Wiener measure. If this is true we say that the invariance
principle holds for $\{X_n\}$. Donsker has shown that the invariance
principle holds if $\{X_n\}$ is independent and stationary
and $X_1$ has zero mean and unit variance. In the present paper
we prove, after some measure-theoretic preliminaries, that the
invariance principle holds under each of the following conditions.
(i) $X_n = f(x_n)$, where $f$ is a function on the state space
of a discrete Markov process $\{x_n\}$ satisfying Doeblin's condition.
(ii) $\{X_n\}$ is m-dependent. (iii) $\{X_n\}$ is a discrete
linear process with m-dependent residuals. (iv) $X_n$ is 1 or 0
according as a recurrent event occurs or not at the nth of a
sequence of trials. In each of these four cases the additional
assumptions under which the invariance principle is proved
are essentially those under which the corresponding central
limit theorem has been proved.
§0. Introduction.

In [9]* Erdos and Kac introduced a new method for proving 
weak limit theorems for functions of the partial sums of an inde- 
pendent sequence of random variables. Their method consisted 
in showing first that the limiting distribution is independent of 
the particular sequence and then computing this distribution for 
some convenient sequence. Let 

(0.1) $X_1, X_2, \ldots$

be an independent sequence of identically distributed random 
variables with zero means and unit variances. Let 

$S_n = X_1 + \cdots + X_n$. Erdos and Kac showed that the limiting
distribution of

(0.2) $\max_{k \le n} S_k$

(along with several other functions of $S_1, \ldots, S_n$) is independent
of the distribution function common to the variables in the sequence
$\{X_n\}$. Since the limiting distribution of (0.2) in the Bernoulli
case was well known, this argument gave the limiting distribution
of (0.2) under quite general conditions. There followed
several papers (cf., e.g., Erdos and Kac [10] and Mark [17])
in which this argument, known as the invariance principle, was
applied to various other functions of the partial sums. All
of these results were subsequently subsumed under a general
theorem due to Donsker [6].

* Numbers in brackets refer to the bibliography at the end of the
paper. The expression "Theorem i.j" refers to the jth theorem
of §i, while "Theorem A.j" refers to the jth theorem of the
appendix.

Donsker's result runs essentially as follows. Let C be 
the space of functions x (t) continuous on the closed unit interval, 
with the uniform topology, and let W be Wiener measure on C . 
Let $\{X_n\}$ be a sequence of random variables on some probability
measure space $(\Omega, \mathcal{B}, P)$. Let $p_n$ be that element of $C$ which
is linear on each of the intervals $((j-1)n^{-1}, jn^{-1})$, $j = 1, \ldots, n$,
and satisfies $p_n(jn^{-1}) = S_j$ for $j = 1, \ldots, n$ and $p_n(0) = 0$. That
is, let $p_n$ be the random function defined by

(0.3) $p_n(t) = S_{j-1} + (nt - j + 1) X_j$, $(j-1)n^{-1} \le t \le jn^{-1}$, $j = 1, \ldots, n$,

where $S_0 = 0$. Thus $p_n$ is a (measurable) mapping of $\Omega$ into $C$.
Let f be any function on C which is continuous in the uniform top- 
ology at almost all (W-measure) points of C . Donsker showed 
that if (0.1) is an independent sequence of random variables which 
are identically distributed with zero mean and unit variance, then 
$\lim_{n \to \infty} P\{f(n^{-1/2} p_n) \le a\} = W\{x: f(x) \le a\}$

at continuity points $a$ of the function $W\{x: f(x) \le a\}$. If, for
example, $f(x) = \max_{0 \le t \le 1} x(t)$, then $f$ satisfies the above conditions
and $f(n^{-1/2} p_n) = n^{-1/2} \max_{k \le n} S_k$, so that

$\lim_{n \to \infty} P\{n^{-1/2} \max_{k \le n} S_k \le a\} = W\{x: \max_{0 \le t \le 1} x(t) \le a\} = \begin{cases} 0 & \text{if } a < 0, \\ (2/\pi)^{1/2} \int_0^a e^{-u^2/2}\, du & \text{if } a \ge 0, \end{cases}$



where the right hand equality can be established by any one of a 
number of methods. See [6] for other functions f which lead to 
interesting limit theorems. 
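
As an illustration only, the following Python sketch evaluates the random polygon of (0.3) and compares a Monte Carlo estimate of $P\{n^{-1/2} \max_{k \le n} S_k \le a\}$ with the limit above in the Bernoulli case. The function names, the choice of steps $\pm 1$, the sample sizes and the seed are assumptions made for this example and do not come from the paper. The maximum is taken over $0 \le k \le n$ because the polygon starts at $p_n(0) = 0$.

```python
import math
import random


def polygon(steps, t):
    """Evaluate p_n(t) of (0.3) for 0 <= t <= 1, where steps = [X_1, ..., X_n]."""
    n = len(steps)
    if t <= 0.0:
        return 0.0
    # Index j with (j-1)/n <= t <= j/n.
    j = min(n, max(1, math.ceil(n * t)))
    s_prev = sum(steps[: j - 1])  # S_{j-1}, with S_0 = 0
    return s_prev + (n * t - j + 1) * steps[j - 1]


def limit_cdf(a):
    """W{x: max x(t) <= a}, i.e. (2/pi)^(1/2) * integral_0^a exp(-u^2/2) du."""
    return math.erf(a / math.sqrt(2.0)) if a >= 0 else 0.0


def empirical_cdf(a, n=400, trials=1000, seed=0):
    """Monte Carlo estimate of P{n^(-1/2) max_{0<=k<=n} S_k <= a} for +-1 steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = running_max = 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            running_max = max(running_max, s)
        if running_max <= a * math.sqrt(n):
            hits += 1
    return hits / trials
```

For moderate n the estimate returned by `empirical_cdf(1.0)` should be close to `limit_cdf(1.0)`, up to sampling error and the lattice effect of the $\pm 1$ steps.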

It should be pointed out that in place of the "random polygon"
$p_n$ defined by (0.3), Donsker actually worked with the "random
step-function" with value $S_j$ throughout the interval
$((j-1)n^{-1}, jn^{-1}]$. There is of course no essential difference
between the two methods.

There is another way of stating Donsker's result. Let $\mathcal{C}$
be the Borel field generated by the open (uniform topology)
subsets of $C$. Suppose there exists a sequence $\{a_n\}$ of positive
constants such that if $P_n$ is a measure defined by setting
$P_n(A) = P\{a_n^{-1} p_n \in A\}$ for $A \in \mathcal{C}$, then $P_n$ converges weakly
to $W$. When this is true we say that the invariance principle
holds for the sequence $\{X_n\}$ with norming factors $a_n$. Then
(cf. Theorem 1.1 below) Donsker's result says that the invariance
principle holds, with norming factors $n^{1/2}$, provided $\{X_n\}$ is an
independent stationary sequence with $E\{X_1\} = 0$ and $E\{X_1^2\} = 1$.
The assumption that $\{X_n\}$ is stationary is relatively unimportant.

It is the purpose of the present paper to replace the assumption of
independence by various weaker hypotheses.

The plan of the paper is as follows. The central results are
contained in §§4 through 7, those of §§1 through 3 being
preliminary. These first three sections are devoted to an account of the
theory of weak convergence of probability measures on $C$ (§1),
an alternative proof of the existence of Wiener measure (§2) and
a general invariance principle (§3).

In §3 we sort out those steps in the proof of Donsker's
theorem (Theorem 1 of [6]) which depend upon the assumption
of independence and state them as the hypotheses of Theorem 3.1.
This theorem then gives a set of conditions on the sequence $\{X_n\}$
which insures that the invariance principle holds with a suitable
sequence of norming factors. While these conditions are not very
pleasing, they can be verified for those sequences $\{X_n\}$ of
greatest interest.

Theorems 1.1 and 1.3 are preliminary to §3. Theorem 1.1
gives several sets of conditions equivalent to weak convergence.
These conditions, with the possible exception of (ii), are well
known. Theorem 1.3, on which Theorem 3.1 depends, gives
a simple criterion for weak convergence of probability measures
on $\mathcal{C}$ in terms of the convergence of the measures of sets of
the form

$\{x: \alpha_j < x(t) < \beta_j,\ (j-1)c^{-1} < t < jc^{-1},\ j = 1, \ldots, c\}$,

where $c$ is a positive integer and $\alpha_j$, $\beta_j$ are arbitrary real numbers.
This theorem is a slight generalization of one due to Donsker.
The proof differs from his in that several arguments "of the
Riemann approximation type" are eliminated, which elimination
is made possible by condition (ii) of Theorem 1.1. Theorem 1.4,
which is essential to the considerations of §4, is the analogue for
distributions on $C$ of a well-known limit theorem for distributions
on the real line. Its proof depends on condition (ii) of Theorem 1.1.

Theorem 1.2 and §2 are a side issue. Theorem 1.2 is a 
result, announced by Prohorov [19] , on the weak compactness of 
measures on C . Prohorov has used this theorem to give an 
elegant proof of the invariance principle in the independent case, 
but his method seems difficult to apply in those cases to which 
the present paper is devoted. His result is really an existence
theorem, and in §2 we use it to prove the existence of Wiener
measure. This method of proving the existence of a stochastic
process is of course not very general (cf. [8, Ch. II] for a
general approach) but an alternative proof of this important theorem
is interesting.

In §§4 through 7, Theorem 3.1 is specialized in various
ways. In §4 the invariance principle is proved for sequences
$\{f(x_n)\}$, where $f$ is a function defined on the state space of a
discrete Markov process $\{x_n\}$ satisfying Doeblin's hypothesis.
The conditions under which this result is obtained are identical
with those under which the central limit theorem for such processes
is proved in [8].

In §5 we prove the invariance principle for m-dependent 
sequences of random variables. The best central limit theorems 
for such sequences are due to Marsaglia [18], and the conditions 
under which the theorems of §5 are proved are essentially those 
of his central limit theorems. Donsker's original theorem follows
from the results of §5.

§6 treats of discrete linear processes with m-dependent 
residuals, processes which arise in the analysis of time series. 
Here we prove the invariance principle under conditions only 
slightly stronger than those assumed by Diananda [5] in his proof 
of the central limit theorem for such processes. 
Finally, in §7 we prove the invariance principle for the
number of occurrences of a recurrent event. Here we assume
that the recurrence time has a finite second moment.

In the appendix we prove several limit theorems for c- 
dimensional distribution functions. These theorems are all 
routine extensions of results well known for the case c = 1 . 

It is doubtful that the results of §§4 through 7 can be
substantially improved using present methods, since in each
case the invariance principle is proved under conditions virtually
the same as those under which the central limit theorem has been
proved. It is possible to prove the invariance principle in cases
other than those considered here. One can, for example, prove
it for martingales, as Levy [16] has done for the central limit theorem,
or under the assumptions of Bernstein's "lemme fondamental"
[2]. Although no applications have been essayed, the cases
treated in §§4 through 7 are those of greatest interest for the
applications.
§1. Convergence of measures on the space of continuous functions.



In this section we prove two useful theorems on the convergence 
of probability measures on the space of continuous functions. 

Consider first an arbitrary metric space $\mathfrak{X}$ with metric $\rho$.
In what follows we will be interested in the cases in which $\mathfrak{X}$ is
either the space of continuous functions or a Euclidean space. Let
$\mathfrak{B}$ be the collection of Borel sets, that is, the Borel field
generated by the open sets. If $P_n$, $P$ are probability measures on $\mathfrak{B}$,
we say that $P_n$ converges weakly to $P$ (in symbols, $P_n \Rightarrow P$) if

$\int_{\mathfrak{X}} f\, dP_n \to \int_{\mathfrak{X}} f\, dP$

for all bounded continuous functions $f$.

Theorem 1.1 gives several convenient sets of conditions 
equivalent to weak convergence. For its proof we require the 
following variation on Urysohn's lemma. 

Lemma 1.1. If $A$ and $B$ are closed sets with $\rho(A, B) > 0$,
then there exists a function $f(x)$ which is 1 on $A$, 0 on $B$,
everywhere between 0 and 1, and uniformly continuous on $\mathfrak{X}$.

Proof. We may of course assume that $A$ and $B$ are non-empty.
With the exception of uniform continuity, it is clear that the function

$f(x) = \frac{\rho(x, B)}{\rho(x, B) + \rho(x, A)}$

has the required properties. To prove that $f$ is uniformly
continuous observe first that (cf. [1, p. 57])

$|\rho(x, A) - \rho(y, A)| \le \rho(x, y)$

and

$\rho(x, A) + \rho(x, B) \ge \rho(A, B)$.

From these inequalities it follows that

$|f(x) - f(y)| \le \frac{|\rho(x, B) - \rho(y, B)|}{\rho(x, B) + \rho(x, A)} + \rho(y, B) \left| \frac{1}{\rho(x, B) + \rho(x, A)} - \frac{1}{\rho(y, B) + \rho(y, A)} \right|$

$\le \frac{\rho(x, y)}{\rho(A, B)} + \frac{\rho(y, B)}{\rho(y, B) + \rho(y, A)} \cdot \frac{|\rho(y, B) - \rho(x, B)| + |\rho(y, A) - \rho(x, A)|}{\rho(x, B) + \rho(x, A)}$

$\le \frac{3}{\rho(A, B)}\, \rho(x, y)$.

Hence $f$ is uniformly continuous on $\mathfrak{X}$.
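
The construction in Lemma 1.1 can be checked numerically in a simple special case. The sketch below is an illustration under assumptions that are not in the paper: the metric space is the real line with the usual distance, and $A$ and $B$ are small finite (hence closed) sets chosen for the example. It evaluates $f$ and the largest difference quotient of $f$ on a grid, which should not exceed the bound $3/\rho(A, B)$ obtained in the proof.

```python
def dist(x, points):
    """rho(x, A) = min over a in A of |x - a|, for a finite set A on the real line."""
    return min(abs(x - a) for a in points)


def urysohn(x, A, B):
    """f(x) = rho(x, B) / (rho(x, B) + rho(x, A)), as in the proof of Lemma 1.1."""
    return dist(x, B) / (dist(x, B) + dist(x, A))


# Hypothetical example sets; any disjoint finite sets would do.
A = [0.0, 1.0]
B = [1.5, 3.0]
gap = min(abs(a - b) for a in A for b in B)  # rho(A, B)

# Largest difference quotient of f over neighbouring grid points.
grid = [i / 100.0 for i in range(-200, 501)]
worst = max(
    abs(urysohn(x, A, B) - urysohn(y, A, B)) / abs(x - y)
    for x, y in zip(grid, grid[1:])
)
```

Here `gap` is 0.5, so the Lipschitz bound from the proof is 6; the observed value of `worst` stays below it.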

In what follows we denote the closure, interior and boundary
of a set $A$ by $\bar{A}$, $A^{\circ}$ and $\partial A$, respectively. If $P$ is a probability
measure on $\mathfrak{B}$ and $f$ is a measurable function then $P\{x: f(x) \le a\}$
is a function of $a$ which we call the P-distribution function of $f$.



Theorem 1.1. The following statements are equivalent.

(i) $P_n \Rightarrow P$.

(ii) $\int_{\mathfrak{X}} f\, dP_n \to \int_{\mathfrak{X}} f\, dP$ for bounded uniformly continuous
functions $f$.

(iii) $P(A) \ge \limsup_{n \to \infty} P_n(A)$ for closed sets $A$.

(iv) $P(A) = \lim_{n \to \infty} P_n(A)$ for sets $A \in \mathfrak{B}$ such that $P(\partial A) = 0$.

(v) For any function $f$ which is continuous except on a set of
P-measure zero, the $P_n$-distribution function of $f$ converges
to the P-distribution function of $f$ at each continuity point of
the latter.

Proof. We will prove in turn the implications (i) $\to$ (ii) $\to$ (iii) $\to$
(iv) $\to$ (v) $\to$ (i). The implication (i) $\to$ (ii) is trivial.

(ii) $\to$ (iii). Suppose that $A$ is closed and $\varepsilon > 0$ given. We
may assume that $A$ is neither the empty set nor the whole space.
For $\delta > 0$ let $U_\delta = \{x: \rho(x, A) < \delta\}$. Then $U_\delta$ is open
and $U_\delta \downarrow A$ as $\delta \downarrow 0$, since $A$ is closed. Hence there exists
a $\delta$ such that $P(U_\delta - A) < \varepsilon$. Clearly $\rho(A, \mathfrak{X} - U_\delta) \ge \delta > 0$.
Therefore, by Lemma 1.1, there exists a uniformly continuous
function $f$ which is 1 on $A$, 0 on $\mathfrak{X} - U_\delta$ and everywhere
between 0 and 1. Now

$\int_{\mathfrak{X}} f\, dP_n \to \int_{\mathfrak{X}} f\, dP$

by (ii), and

$\int_{\mathfrak{X}} f\, dP_n \ge P_n(A)$,

while

$\int_{\mathfrak{X}} f\, dP \le P(A) + P(U_\delta - A) < P(A) + \varepsilon$.

From these three relations it follows that

$\limsup_{n \to \infty} P_n(A) \le P(A) + \varepsilon$.

Since $\varepsilon$ is arbitrary, (iii) follows.

(iii) $\to$ (iv). Suppose that $P(\partial A) = 0$. Then

(1.1) $P(A) = P(\bar{A}) \ge \limsup_{n \to \infty} P_n(\bar{A}) \ge \limsup_{n \to \infty} P_n(A)$.

Since the boundary of $\mathfrak{X} - A$ also has P-measure zero we have in
the same way,

(1.2) $P(\mathfrak{X} - A) \ge \limsup_{n \to \infty} P_n(\mathfrak{X} - A)$.

But (1.1) and (1.2) imply

$P(A) = \lim_{n \to \infty} P_n(A)$.

(iv) $\to$ (v). Let $F_n$ and $F$ be respectively the $P_n$-distribution
functions and the P-distribution function of $f$ and let $A$ be the
set of points at which $f$ is discontinuous. Then

$\overline{\{x: f(x) \le a\}} \subset \{x: f(x) \le a\} \cup A$

and

$\{x: f(x) < a\} - A \subset \{x: f(x) \le a\}^{\circ}$,

so that the boundary of $\{x: f(x) \le a\}$ is contained in $\{x: f(x) = a\} \cup A$.
Since $P(A) = 0$, $F(a) = F(a - 0)$ implies that the boundary of
$\{x: f(x) \le a\}$ has P-measure zero, and hence that $F_n(a) \to F(a)$.

(v) $\to$ (i). Let $F_n$ and $F$ be the $P_n$-distribution functions and
the P-distribution function of the bounded continuous function $f$.
We assume that $F_n(a) \to F(a)$ if $F$ is continuous at $a$ and must
show that

$\int_{\mathfrak{X}} f\, dP_n \to \int_{\mathfrak{X}} f\, dP$.

But this last statement is equivalent to

(1.3) $\int_{-M}^{M} a\, dF_n(a) \to \int_{-M}^{M} a\, dF(a)$,

where $M$ is the bound of $f$. But (1.3) is easy to establish (cf.
[3, p. 74]). This completes the proof of Theorem 1.1.

We note at this point the well-known fact that if $\mathfrak{X}$ is
c-dimensional Euclidean space then equivalent to each of the
conditions of Theorem 1.1 is the condition that if $F_n$ and $F$ are the
distribution functions corresponding to $P_n$ and $P$ respectively,
then

$\lim_{n \to \infty} F_n(a_1, \ldots, a_c) = F(a_1, \ldots, a_c)$

at each point $(a_1, \ldots, a_c)$ such that

$F(a_1, \ldots, a_c) = \sup_{\beta_j < a_j} F(\beta_1, \ldots, \beta_c)$,

i.e., at continuity points of $F$.
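
A one-dimensional example shows why convergence is required only at continuity points of $F$. The sketch below is an illustration with assumed choices not taken from the paper: $P_n$ is the unit mass at $1/n$ and $P$ the unit mass at 0, so that $P_n \Rightarrow P$. The distribution functions converge everywhere except at $a = 0$, which is the single discontinuity point of $F$.

```python
def F_n(a, n):
    """Distribution function of the unit mass at 1/n: P_n{x <= a}."""
    return 1.0 if a >= 1.0 / n else 0.0


def F(a):
    """Distribution function of the unit mass at 0: P{x <= a}."""
    return 1.0 if a >= 0.0 else 0.0


def limit_of_F_n(a, n_large=10**6):
    """F_n(a) for one large n, standing in for the limit as n grows.

    This shortcut is valid here only for a <= 0 or a >= 1/n_large.
    """
    return F_n(a, n_large)


continuity_points = [-1.0, -0.01, 0.01, 1.0]
agree = all(limit_of_F_n(a) == F(a) for a in continuity_points)
jump_at_zero = (limit_of_F_n(0.0), F(0.0))
```

At the continuity points the two functions agree, while at $a = 0$ the values of $F_n$ stay at 0 although $F(0) = 1$.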

Let $C$ be the space of functions $x(t)$ continuous on the
closed unit interval, with the metric

$\rho(x, y) = \sup_{0 \le t \le 1} |x(t) - y(t)|$.

Then $C$ is a complete, separable metric space. Let $\mathcal{C}$ be the
collection of Borel sets. The Borel field $\mathcal{C}$ is generated by the
sets of the form $\{x: x(t) \le a\}$. That such sets belong to $\mathcal{C}$ is
obvious, and to see that they generate $\mathcal{C}$ it is enough to observe
that

$\{x: \rho(x, x_0) \le \delta\} = \bigcap_r \{x: |x(r) - x_0(r)| \le \delta\}$,

where the intersection extends over all rationals $r$ in the unit
interval.

If $t_1, \ldots, t_k$ are fixed points in the closed unit interval,
$[x(t_1), \ldots, x(t_k)]$ defines, as $x$ varies over $C$, a k-dimensional
random vector on $C$ which we denote $[x_{t_1}, \ldots, x_{t_k}]$.

The first of the two theorems concerning the convergence of
probability measures on $C$ which we will need is due to Prohorov
[19].
Theorem 1.2. Suppose that $\{P_n\}$ is a sequence of probability
measures on $\mathcal{C}$ with the property that for each $\varepsilon > 0$ there
exists a compact set $K_\varepsilon$ such that $P_n(K_\varepsilon) > 1 - \varepsilon$ for all $n$.
Then there exists a sequence $\{n_\nu\}$ and a probability measure
$P$ such that $P_{n_\nu} \Rightarrow P$ as $\nu \to \infty$.

Proof. Let $\{r_k: k \ge 1\}$ be an ordering of the rationals of the
closed unit interval. For each $n$ and $k$

(1.4) $\mu_{k,n}(S) = P_n\{[x_{r_1}, \ldots, x_{r_k}] \in S\}$

is a probability measure on the k-dimensional Borel sets $S$. Let
$F_{k,n}$ be the distribution function corresponding to $\mu_{k,n}$. For
each $k$ it is possible by Helly's theorem to find an increasing
sequence $\{n_\nu\}$ of integers and a function $F_k(a_1, \ldots, a_k)$ such
that $F_k$ is everywhere between 0 and 1, is non-decreasing in
each variable, is continuous from above and

(1.5) $\lim_{\nu \to \infty} F_{k,n_\nu}(a_1, \ldots, a_k) = F_k(a_1, \ldots, a_k)$

at continuity points of $F_k$. By the diagonal method it is possible
to choose a single sequence $\{n_\nu\}$ so that (1.5) holds for all $k$
simultaneously.

For a fixed $k$ let

$S_\varepsilon = \{[x(r_1), \ldots, x(r_k)]: x \in K_\varepsilon\}$.

Then $S_\varepsilon$ is a k-dimensional Borel set with

$\mu_{k,n}(S_\varepsilon) > 1 - \varepsilon$

for all $n$. Since $K_\varepsilon$ is compact, $S_\varepsilon$ is bounded. From these
facts it follows that $F_k$ must have total variation 1, i.e., that it
is the distribution function corresponding to some probability
measure $\mu_k$. And from the remark following the proof of
Theorem 1.1 we conclude that

(1.6) $\mu_{k,n_\nu} \Rightarrow \mu_k$

for all $k$.

We now use the measures $\mu_k$ to set up a measure on $C$
in a way similar to that used by Kolmogorov [15] in his fundamental
existence theorem. His theorem as such is not applicable here since
we are working in the space of continuous functions rather than the
space of all functions. Let $\mathcal{F}$ be the (finitely additive) field of sets
of the form

(1.7) $A = \{x: [x(r_1), \ldots, x(r_k)] \in S\}$,

where $k$ is any integer and $S$ is a k-dimensional Borel set. For
such a set $A$ put $P(A) = \mu_k(S)$. Since there are other
representations of $A$, we must show that this definition is consistent.

Suppose then that in addition to (1.7) we have

(1.8) $A = \{x: [x(r_1), \ldots, x(r_j)] \in S'\}$,

where $j > k$ and $S'$ is a j-dimensional Borel set. From the fact
that for any point $(\xi_1, \ldots, \xi_j)$ of j-space there exists an
$x \in C$ with $[x(r_1), \ldots, x(r_j)] = (\xi_1, \ldots, \xi_j)$ it follows that
$S' = \{(\xi_1, \ldots, \xi_j): (\xi_1, \ldots, \xi_k) \in S\}$. From this and
the definition (1.4) we have

$\mu_{k,n}(S) = \mu_{j,n}\{(\xi_1, \ldots, \xi_j): (\xi_1, \ldots, \xi_k) \in S\}$

for all $n$. Hence by (1.6)

(1.9) $\mu_k(S) = \mu_j\{(\xi_1, \ldots, \xi_j): (\xi_1, \ldots, \xi_k) \in S\}$,

provided the boundary of $S$ has $\mu_k$-measure zero. But this
clearly implies (1.9) for all $S$, which establishes the consistency
of the definition of $P(A)$.
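
The consistency relation between the k-dimensional and j-dimensional measures can be seen concretely for an empirical measure. The sketch below is only an illustration; the sample of $\pm 1$ random-walk polygons, the five time points standing in for $r_1, \ldots, r_j$, the set $S$ and all function names are assumptions made for the example. It computes the measure of $S$ from the first $k$ coordinates and the measure of the cylinder over $S$ from all $j$ coordinates.

```python
import random


def path_values(steps, times):
    """[x(r_1), ..., x(r_j)] for the polygon through the partial sums of steps."""
    n = len(steps)
    sums = [0.0]
    for s in steps:
        sums.append(sums[-1] + s)
    values = []
    for t in times:
        i = min(n - 1, int(n * t))  # t lies in [i/n, (i+1)/n]
        values.append(sums[i] + (n * t - i) * steps[i])
    return tuple(values)


def measure(vectors, indicator):
    """Fraction of the sampled vectors lying in the set with this indicator."""
    return sum(1 for v in vectors if indicator(v)) / len(vectors)


rng = random.Random(1)
times = [0.0, 1.0, 0.5, 1.0 / 3.0, 2.0 / 3.0]  # stand-ins for r_1, ..., r_j, j = 5
paths = [[rng.choice((-1, 1)) for _ in range(12)] for _ in range(500)]
vectors_j = [path_values(p, times) for p in paths]
k = 3
vectors_k = [v[:k] for v in vectors_j]


def in_S(v):
    """Hypothetical set S = {(xi_1, xi_2, xi_3): xi_2 <= 0 and xi_3 >= 0}."""
    return v[1] <= 0.0 and v[2] >= 0.0


mu_k_of_S = measure(vectors_k, in_S)
mu_j_of_cylinder = measure(vectors_j, lambda v: in_S(v[:k]))
```

The two numbers coincide because both count the same sample paths, which is the finite-sample counterpart of the identity preceding (1.9).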

It is easy to show that $P$ is a finitely additive measure on $\mathcal{F}$
and that $P(C) = 1$. We now prove that $P$ is completely
additive on $\mathcal{F}$. Suppose then that $\{A_k\}$ is a non-increasing sequence
of $\mathcal{F}$ sets with $P(A_k) > L > 0$ for all $k$. We will show that the
$A_k$ have a non-empty intersection. For notational convenience we
assume that $A_k$ is defined in terms of the first $k$ of the $\{r_i\}$:

$A_k = \{x: [x(r_1), \ldots, x(r_k)] \in S_k\}$.



